Neural networks with high order connections 



Jeferson J. Arenzon and Rita M.C. de Almeida 
Instituto de Física
Universidade Federal do Rio Grande do Sul 
CP. 15051 - 91501-970 - Porto Alegre - RS - Brazil 
E-mail: ARENZON@IF1.UFRGS.BR

February 1, 2008 

Abstract 

We present results for two different kinds of high order connections between neurons acting as corrections to the Hopfield model. Equilibrium properties are analyzed using the replica mean-field theory and compared with numerical simulations. An optimal learning algorithm for fourth order connections is given that improves the storage capacity without increasing the weight of the higher order term. While the behavior of one of the models qualitatively resembles the original Hopfield one, the other presents a new and very rich behavior: depending on the strength of the fourth order connections and the temperature, the system presents two distinct retrieval regions separated by a gap, as well as several phase transitions. Also, the spin glass states seem to disappear above a certain value of the load parameter $\alpha$, $\alpha_g$.



PACS: 87.10.+e, 75.10.Hk






1 INTRODUCTION 



Synapses connecting more than two neurons have been introduced in an attempt both to improve the storage capacity of existing models and to mimic synapses existing in real brains (see [1] and references therein). Biologically, the idea of multisynapses is strongly motivated [?]: axon-axon-dendrite connections, for instance, are relatively common in real nervous systems and can be described as third order synapses, and even more intricate connections, involving more than two axons, may also exist in the brain. However, since second order synapses are highly dominant, higher order terms should be considered as corrections. As stressed in ref. [?], this feature may play an essential role in the functioning of the central nervous systems of higher vertebrates. Moreover, when some pairwise connections are close enough they may interact, and that can also be considered as high order synapses (although in this case there are only interactions of even order).

Networks with $N$ infinite range interacting Ising spins $\{S_i = \pm 1\}$, associated with the state of the neurons (active or inactive), are considered to describe learning, storage, and retrieval of information. Possible configurations of the network are represented by $N$-dimensional vectors $\mathbf{S} = (S_1, \ldots, S_N)$, and the stored information (memories) is associated with $P$ of these states, denoted by the vectors $\boldsymbol{\xi}^\mu$, $\mu = 1, \ldots, P$. The network load is measured by the parameter $\alpha$, usually $\alpha = P/N$, and the performance of a model for attractor networks can be measured by its storage capacity and its ability to recall the stored patterns, in particular the maximum allowed noise in an initial configuration and the time needed by the network to evolve and stabilize at, or near, one of the $P$ memories.

Several works introduced multispin interactions by generalizing the Hopfield model and the Hebb learning rule by a monomial of degree $k > 2$ in the Ising spins [?]. These models have been investigated both analytically and through computer simulations. Alternatively, a recently introduced model [?] simultaneously considers several orders of interactions besides the second order Hopfield term. In this paper we present a truncated version of that model (hereafter called the Truncated model) and study the effect of the (weighted) first correction to the Hopfield term. We also investigate the effect of a Hopfield-like correction (hereafter called the Generalized Hopfield model, GH) and compare both prescriptions.
The paper is organized as follows: section 2 defines the models, and in section 3 analytical and numerical results are presented. In section 4 we give an optimal learning rule that greatly increases the storage capacity, and finally in section 5 we summarize and present our conclusions.

2 THE MODELS 

2.1 The Generalized Hopfield Model 

In order to compare performances, we consider a straightforward generalization of the Hopfield model (GH) by a polynomial of degree $M$ that includes multispin interactions [?, ?], namely

$$E = -N \sum_{\ell=2}^{M} \frac{\varepsilon_\ell}{\ell} \sum_{\mu=1}^{P} m_\mu^{\ell} , \qquad (1)$$

where $M$ is an integer ($M > 2$) and the overlap between the state of the network $\mathbf{S}$ and the pattern $\boldsymbol{\xi}^\mu$ is given by

$$m_\mu = \frac{1}{N} \sum_{i=1}^{N} \xi_i^\mu S_i . \qquad (2)$$

Here we have high order terms as corrections (that do not need to be small) to the original second order Hopfield model. This system has a completely different static and dynamical behavior from the one that will be defined in the next subsection [?]. Here, the higher order corrections do not qualitatively change the $T = 0$ behavior of the Hopfield network, although the $(\alpha, T)$ phase diagram presents some new features. The performance of the Hopfield network is determined by the nature of the local field on a given neuron $S_i$. This local field has two competing components: a signal term that tends to align the spin $S_i$ with a given pattern, and a noise term that has a random orientation. A correction to the second order Hopfield term may then act in two different ways: it may enhance the signal term and/or decrease the noise term. Also, the joint analysis of both terms yields an estimate of the critical capacity of the network: the lowest order interaction ($\ell_{\min}$) in the energy function gives the most relevant contribution to the cross-talk noise from the non-condensed patterns, implying that the maximum number of patterns that can be embedded is $O(N^{\ell_{\min}-1})$. Thus, the presence of the second order term implies that the maximum number of patterns that can be stored is proportional to $N$.
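The signal and noise components described above can be checked numerically for the pure second order (Hopfield) term. The following is a minimal Monte Carlo sketch, assuming unbiased random $\pm 1$ patterns and a network sitting exactly on pattern 1 (index 0 in the code); the function names are illustrative. For uncorrelated patterns the cross-talk on a neuron is a sum of $(P-1)(N-1)$ uncorrelated $\pm 1$ terms divided by $N$, so its variance should come out close to $(P-1)(N-1)/N^2 \simeq \alpha$, which is why the second order term limits the capacity to $O(N)$.

```python
import random

def crosstalk_samples(N, P, n_sets, seed=0):
    """Sample the Hopfield cross-talk noise on neuron i while the network sits on pattern 1:
    (1/N) * sum_{mu != 1} sum_{j != i} xi_i^mu xi_j^mu xi_j^1  (pattern 1 is index 0 here)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_sets):
        xi = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(P)]
        # c[mu] = sum_j xi_j^mu xi_j^1 over all j; the j = i term is removed below
        c = [sum(a * b for a, b in zip(xi[mu], xi[0])) for mu in range(P)]
        for i in range(N):
            noise = sum(xi[mu][i] * (c[mu] - xi[mu][i] * xi[0][i]) for mu in range(1, P))
            samples.append(noise / N)
    return samples

def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)
```

For example, `variance(crosstalk_samples(200, 40, 100))` should be close to $39 \cdot 199 / 200^2 \simeq 0.194$ for $\alpha = 0.2$.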

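The learning rule, for any $\ell \geq 2$, is:

$$J_{i_1 \ldots i_\ell}(P+1) = J_{i_1 \ldots i_\ell}(P) + \frac{\varepsilon_\ell}{N^{\ell-1}}\, \xi_{i_1}^{P+1} \cdots \xi_{i_\ell}^{P+1} . \qquad (3)$$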

Here the couplings have the full symmetry of indices, i.e., all connections are symmetric under any permutation of their indices. Remark that in a more general case the weights could be taken to be different for each pattern, $\varepsilon_\ell = \varepsilon_\ell(\mu)$, generalizing the model studied by Viana [?].
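As a consistency check of the learning rule (3) against the overlap form of the energy, the sketch below teaches patterns one at a time and compares the two ways of computing $E$. It assumes the normalisation $E = -N \sum_\ell (\varepsilon_\ell/\ell) \sum_\mu m_\mu^\ell$, with coupling increments $\varepsilon_\ell/N^{\ell-1}$ and the repeated-index terms kept, so that $\varepsilon_\ell = \delta_{\ell 2}$ gives back the Hopfield model. The function names are hypothetical, and the brute-force tensors are only usable for a handful of neurons.

```python
import itertools
import random

def spin_product(S, idx):
    """Product S_{i1} * ... * S_{il} for the index tuple idx."""
    prod = 1
    for i in idx:
        prod *= S[i]
    return prod

def learn_gh(patterns, eps):
    """Teach patterns one at a time with
    J_{i1..il}(P+1) = J_{i1..il}(P) + eps_l / N**(l-1) * xi_{i1}^{P+1} ... xi_{il}^{P+1}.
    eps maps the order l to its weight eps_l; all index tuples (diagonal included) are stored."""
    N = len(patterns[0])
    J = {l: {idx: 0.0 for idx in itertools.product(range(N), repeat=l)} for l in eps}
    for xi in patterns:
        for l, w in eps.items():
            for idx in J[l]:
                J[l][idx] += w / N ** (l - 1) * spin_product(xi, idx)
    return J

def energy_from_couplings(J, S):
    """E = -sum_l (1/l) sum_{i1..il} J_{i1..il} S_{i1} ... S_{il}."""
    return -sum(val * spin_product(S, idx) / l for l, Jl in J.items() for idx, val in Jl.items())

def energy_from_overlaps(patterns, eps, S):
    """E = -N sum_l (eps_l / l) sum_mu m_mu**l with m_mu = (1/N) sum_i xi_i^mu S_i."""
    N = len(S)
    ms = [sum(x * s for x, s in zip(xi, S)) / N for xi in patterns]
    return -N * sum(w / l * sum(m ** l for m in ms) for l, w in eps.items())
```

With, e.g., `eps = {2: 1.0, 4: 0.7}` the two energy functions agree to rounding error for any spin configuration.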

This model presents an overall behavior qualitatively similar to the standard Hopfield model at $T = 0$. For $\alpha < \alpha_c$, the retrieval quality is good ($m \simeq 1$) and the size of the basins of attraction decreases with $\alpha$. Also, if the initial state is outside the basin of attraction, the mean convergence time grows with the network size, while inside the basins just one or two steps are enough [?, ?]. Underlying this similarity is the fact that the loss of retrieval abilities due to an overload of the network originates in the same mechanism: the noise term overwhelms the signal one when $\alpha > \alpha_c$.



2.2 The Truncated Model 

The complete energy function of a previously proposed model [?], with all orders of interactions (up to $2P$), for a network storing $P$ patterns is

$$E = N \prod_{\mu=1}^{P} \left(1 - m_\mu^2\right) . \qquad (4)$$

This energy function is proportional to the product of the Hamming distances between the network state $\mathbf{S}$ and the patterns $\boldsymbol{\xi}^\mu$ and their inverses. From eq. (4) it is clear that $E(\mathbf{S}) \geq 0$ (the equality holds if $\mathbf{S} = \pm\boldsymbol{\xi}^\mu$ for any $\mu$). It means that, no matter how large $\alpha$ is, the patterns are always global minima of $E$. A complete discussion of the phase space landscape in the $\alpha \to 0$ limit, as well as simulation results for $\alpha \neq 0$, can be found in refs. [?, ?].
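Eq. (4) and its Hamming-distance reading are easy to verify directly. In this minimal sketch (illustrative names), the energy vanishes at every pattern and at every inverse pattern even when $P$ exceeds $N$ (here $\alpha = 1.5$). It also checks that $1 - m_\mu^2 = 4\, d(\mathbf{S}, \boldsymbol{\xi}^\mu)\, d(\mathbf{S}, -\boldsymbol{\xi}^\mu)/N^2$, where $d$ is the Hamming distance.

```python
import random

def overlaps(patterns, S):
    """m_mu = (1/N) sum_i xi_i^mu S_i, eq. (2)."""
    N = len(S)
    return [sum(x * s for x, s in zip(xi, S)) / N for xi in patterns]

def complete_energy(patterns, S):
    """Eq. (4): E = N * prod_mu (1 - m_mu^2)."""
    E = float(len(S))
    for m in overlaps(patterns, S):
        E *= 1.0 - m * m
    return E

def hamming(a, b):
    """Number of sites where the two configurations differ."""
    return sum(x != y for x, y in zip(a, b))

def energy_from_hamming(patterns, S):
    """Same energy through Hamming distances: 1 - m^2 = 4 d(S, xi) d(S, -xi) / N^2."""
    N = len(S)
    E = float(N)
    for xi in patterns:
        E *= 4.0 * hamming(S, xi) * hamming(S, [-x for x in xi]) / N ** 2
    return E
```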

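The multi-interaction nature of equation (4) becomes evident when it is displayed as

$$E = N \left[ 1 - \sum_{\mu_1} m_{\mu_1}^2 + \sum_{\mu_1<\mu_2} m_{\mu_1}^2 m_{\mu_2}^2 + \cdots + (-1)^P \sum_{\mu_1<\cdots<\mu_P} m_{\mu_1}^2 \cdots m_{\mu_P}^2 \right] . \qquad (5)$$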
Notice that although the first non-trivial term is the Hopfield energy function, the higher order ones are different from any previous model because they contain mixed memory terms. Also, since we have the Hopfield term along with higher order ones, we expect that the number of patterns that can be stored is $O(N)$. We now define an energy function by neglecting the constant zeroth order term and keeping the next $M$ terms ($M < P$). This energy function, after introducing weights and renormalizing it by a factor $1/2$, reads

$$E = \frac{N}{2} \sum_{\ell=1}^{M} (-1)^\ell \varepsilon_\ell \sum_{\mu_1 < \cdots < \mu_\ell} m_{\mu_1}^2 \cdots m_{\mu_\ell}^2 . \qquad (6)$$
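The inner sums in eqs. (5) and (6) are elementary symmetric polynomials in the variables $m_\mu^2$, so the truncated energy can be evaluated in $O(PM)$ operations without enumerating index tuples. The sketch below (hypothetical helper names) uses the standard recursion for these polynomials and checks that keeping all $P$ orders with unit weights recovers $(N/2)[\prod_\mu (1 - m_\mu^2) - 1]$, i.e. eq. (4) without its constant term and with the $1/2$ normalisation.

```python
import math
import random

def elementary_symmetric(xs, M):
    """Return [e_0, ..., e_M], e_l = sum over mu_1 < ... < mu_l of x_{mu_1} ... x_{mu_l}."""
    e = [1.0] + [0.0] * M
    for x in xs:
        for l in range(M, 0, -1):   # descending l so that each x enters a product at most once
            e[l] += x * e[l - 1]
    return e

def truncated_energy(ms, N, weights):
    """Eq. (6) with weights = [eps_1, ..., eps_M]:
    E = (N/2) sum_{l=1}^{M} (-1)^l eps_l sum_{mu_1<...<mu_l} m_{mu_1}^2 ... m_{mu_l}^2."""
    M = len(weights)
    e = elementary_symmetric([m * m for m in ms], M)
    return 0.5 * N * sum((-1) ** l * weights[l - 1] * e[l] for l in range(1, M + 1))

def complete_minus_constant(ms, N):
    """(N/2) [prod_mu (1 - m_mu^2) - 1]: eq. (4) halved and without its zeroth order term."""
    return 0.5 * N * (math.prod(1.0 - m * m for m in ms) - 1.0)
```

With `weights = [1.0]` the sketch returns the Hopfield energy $-(N/2)\sum_\mu m_\mu^2$, and with `weights = [1.0, eps]` the $M = 2$ Truncated energy.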

Equation (6) defines a model that can be regarded as the Hopfield model ($\varepsilon_1 = 1$) plus correction terms. Differently from the previous case, here one cannot have $\varepsilon_1 = 0$, because the remaining terms are mixtures and no pattern can be retrieved with them alone. In what follows we consider only the first correction to the Hopfield term ($M = 2$). Eq. (6) can then be rewritten, with $\varepsilon_\ell = \delta_{1\ell} + \varepsilon\,\delta_{2\ell}$, as

$$E = -\frac{1}{2} \sum_{i,j} J_{ij} S_i S_j + \frac{\varepsilon}{2} \sum_{i,j,k,l} J_{ijkl} S_i S_j S_k S_l . \qquad (7)$$

The learning rule for the second order couplings $J_{ij}$ is the Hebb prescription [?], eq. (3). The fourth order synapses $J_{ijkl}$, on the other hand, may be implemented through the following learning rule

$$J_{ijkl} = \frac{1}{N^3} \sum_{\mu<\nu} \xi_i^\mu \xi_j^\mu \xi_k^\nu \xi_l^\nu , \qquad (8)$$

and when a new pattern is learned,

$$J_{ijkl}(P+1) = J_{ijkl}(P) + \frac{1}{N^2}\, J_{ij}(P)\, \xi_k^{P+1} \xi_l^{P+1} , \qquad (9)$$

where $\boldsymbol{\xi}^{P+1}$ is the $(P+1)$th pattern to be taught to the network. Eq. (9) may be regarded as the multisynapses described in the introduction: the last term is the action of two axons upon a binary synapse. Notice that for $P = 1$ this model is equivalent to the standard Hopfield one, since the fourth order synapses do not exist. Remark that while the pairwise connections are symmetric, i.e. $J_{ij} = J_{ji}$, the fourth order ones do not present full symmetry under all possible index interchanges (only $i \leftrightarrow j$ and $k \leftrightarrow l$).
These couplings can be symmetrized if they are rewritten as

$$J'_{ijkl} = \frac{1}{2}\left(J_{ijkl} + J_{ljki} + J_{kjil}\right) , \qquad (10)$$

and hence the energy is a Lyapunov function for the dynamics

$$S_i(t+1) = \mathrm{sgn}\left( \sum_j J_{ij} S_j(t) - \varepsilon \sum_{j,k,l} J'_{ijkl}\, S_j(t) S_k(t) S_l(t) \right) \qquad (11)$$

and the tools of statistical mechanics can be applied. One should also notice the importance of the self couplings here. For instance, the couplings $J_{iikl}$ and $J_{ijkk}$ give rise to contributions to the energy of the same order as those of the couplings $J_{ij}$, which does not happen in the generalized Hopfield model. This can be easily seen if one rewrites the couplings (after symmetrizing the pairs $(ij)$ and $(kl)$) as

$$J_{ijkl} = \frac{1}{2N} J_{ij} J_{kl} - \frac{1}{2N^3} \sum_{\mu} \xi_i^\mu \xi_j^\mu \xi_k^\mu \xi_l^\mu . \qquad (12)$$

Then, for the self couplings mentioned above (with $J_{ii} = \alpha$):

$$J_{iikl} = \frac{\alpha}{2N} J_{kl} - \frac{1}{2N^2} J_{kl} , \qquad (13)$$

where the factor $N^{-1}$ is compensated by a sum over sites ($i$) in eq. (7). Notice that the contribution from $J_{iiii}$ is negligible.
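The learning rule (9), the energy (7) and the self-coupling estimate can all be checked by brute force on a very small network. This sketch rests on three assumptions: $J_{ij}$ keeps its diagonal (so $J_{ii} = \alpha$), the fourth order couplings start from zero, and each new pattern updates them with $J_{ij}(P)$ before the Hebb update of $J_{ij}$. Under these assumptions it reproduces $E = -(N/2)\sum_\mu m_\mu^2 + (\varepsilon N/2)\sum_{\mu<\nu} m_\mu^2 m_\nu^2$, and the pair-symmetrized self coupling $[J_{iikl} + J_{klii}]/2$ equals $(P-1)J_{kl}/(2N^2) = (\alpha/2N - 1/2N^2)J_{kl}$. The function names are illustrative.

```python
import itertools
import random

def learn_truncated(patterns):
    """Hebb couplings J_ij (diagonal kept) and fourth order couplings built with eq. (9):
    J_ijkl(P+1) = J_ijkl(P) + J_ij(P) xi_k^{P+1} xi_l^{P+1} / N^2, starting from zero."""
    N = len(patterns[0])
    J2 = [[0.0] * N for _ in range(N)]
    J4 = {idx: 0.0 for idx in itertools.product(range(N), repeat=4)}
    for xi in patterns:
        # the fourth order update uses J_ij(P), i.e. J_ij *before* the new pattern is added
        for (i, j, k, l) in J4:
            J4[(i, j, k, l)] += J2[i][j] * xi[k] * xi[l] / N ** 2
        for i in range(N):
            for j in range(N):
                J2[i][j] += xi[i] * xi[j] / N
    return J2, J4

def energy_couplings(J2, J4, S, eps):
    """Eq. (7): E = -(1/2) sum_ij J_ij S_i S_j + (eps/2) sum_ijkl J_ijkl S_i S_j S_k S_l."""
    N = len(S)
    e2 = sum(J2[i][j] * S[i] * S[j] for i in range(N) for j in range(N))
    e4 = sum(v * S[i] * S[j] * S[k] * S[l] for (i, j, k, l), v in J4.items())
    return -0.5 * e2 + 0.5 * eps * e4

def energy_overlaps(patterns, S, eps):
    """Eq. (6) with M = 2: E = -(N/2) sum_mu m_mu^2 + (eps N/2) sum_{mu<nu} m_mu^2 m_nu^2."""
    N = len(S)
    m2 = [(sum(x * s for x, s in zip(xi, S)) / N) ** 2 for xi in patterns]
    pairs = sum(m2[a] * m2[b] for a in range(len(m2)) for b in range(a + 1, len(m2)))
    return -0.5 * N * sum(m2) + 0.5 * eps * N * pairs
```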

A previous numerical simulation for $\varepsilon = 1$ shows a continuous transition from the retrieval phase to a non-retrieval one at $T = 0$. The basins of attraction seem to be large and $\alpha$-independent: the minimum initial overlap that allows retrieval in the thermodynamic limit is $\simeq 0.1$ for all $\alpha < \alpha_c$. The mean convergence time $\langle \tau \rangle$ (the average number of whole-network updates required to reach a stable state) increases with $\alpha$ and, for non-small values of $\alpha$, does not depend on the initial overlap $m_0$. Long convergence times and large dispersions around their average values are often related to the irregularity of phase space around the memories [10] (existence of spurious states). However, this interpretation is only valid when the stored patterns are at, or very near, the bottom of the basins of attraction ($m \simeq 1$). Since the transition here is continuous, for large values of $\alpha$ this is no longer true: initial states with overlap $m_0$ with the chosen pattern lie on the surface of a hypersphere of radius proportional to $1 - m_0$ centered at the pattern. On average, the distance in phase space from this surface to the bottom of the basin of attraction is the same as the distance from the memory to the energy minimum, independently of $m_0$. Consequently, the convergence time does not depend on $m_0$. For small values of $\alpha$, on the other hand, the retrieval quality is good ($m \simeq 1$) and a small decrease in the convergence time is observed as $m_0$ increases [?].



3 MEAN FIELD THEORY 

3.1 The Free Energy and Saddle-Point Equations 

The mean field analysis is performed by means of the standard techniques introduced by Amit et al. [12]. In the GH model, the cross-talk noise is governed by the second order term, while the fourth order one contributes only to the signal term. On the other hand, in the Truncated model the higher order terms also contribute (for instance, through the self couplings mentioned in the previous section) to the overall noise due to microscopically overlapping patterns. Up to fourth order ($\varepsilon_\ell = \delta_{1\ell} + \varepsilon\,\delta_{2\ell}$), the Truncated energy function can be rewritten as

$$E = -\frac{N}{2} \sum_\mu m_\mu^2 - \frac{\varepsilon N}{4} \sum_\mu m_\mu^4 + \frac{\varepsilon N}{4} \left( \sum_\mu m_\mu^2 \right)^2 , \qquad (14)$$

and the free energy per neuron can be obtained using the replica trick. After assuming that the replicas are symmetric and taking the limit of zero replicas, we get

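$$\begin{aligned} f = {} & \sum_\mu \left[ t_\mu m_\mu - \frac{1-\varepsilon y}{2}\, m_\mu^2 - \frac{\varepsilon}{4}\, m_\mu^4 \right] - \frac{\varepsilon}{4}\, y^2 \\ & + \frac{\alpha}{2\beta} \ln\left[1-\beta(1-\varepsilon y)(1-q)\right] - \frac{\alpha}{2}\, \frac{(1-\varepsilon y)\, q}{1-\beta(1-\varepsilon y)(1-q)} \\ & + \frac{\alpha\beta r}{2}(1-q) - \frac{1}{\beta} \left\langle\!\left\langle \ln 2\cosh \beta\left[\mathbf{t}\cdot\boldsymbol{\xi} + \sqrt{\alpha r}\, z\right] \right\rangle\!\right\rangle , \end{aligned} \qquad (15)$$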
2 p 



where $y$ is introduced to linearize the last term in eq. (14), and the variables $\mathbf{t}$, $q$ and $r$ are, as usual, introduced to linearize the non-linear terms. The symbol $\langle\langle \cdot \rangle\rangle$ stands for two averages: over the finite number of patterns that may condense and over the Gaussian variable $z$, related to the infinite number of microscopically overlapping memories. The saddle point equations are



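$$m_\mu = \left\langle\!\left\langle \xi^\mu \tanh \beta\left[\mathbf{t}\cdot\boldsymbol{\xi} + \sqrt{\alpha r}\, z\right] \right\rangle\!\right\rangle , \qquad (16)$$

$$t_\mu = (1-\varepsilon y)\, m_\mu + \varepsilon\, m_\mu^3 , \qquad (17)$$

$$q = \left\langle\!\left\langle \tanh^2 \beta\left[\mathbf{t}\cdot\boldsymbol{\xi} + \sqrt{\alpha r}\, z\right] \right\rangle\!\right\rangle , \qquad (18)$$

$$r = \frac{q\,(1-\varepsilon y)^2}{\left[1 - \beta(1-\varepsilon y)(1-q)\right]^2} , \qquad (19)$$

$$y = \sum_\mu m_\mu^2 + \alpha\, \frac{1 - \beta(1-\varepsilon y)(1-q)^2}{\left[1 - \beta(1-\varepsilon y)(1-q)\right]^2} . \qquad (20)$$

Analogously, for the GH model the free energy is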



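$$\begin{aligned} f = {} & \sum_\mu \left[ t_\mu m_\mu - \sum_{\ell=2}^{M} \frac{\varepsilon_\ell}{\ell}\, m_\mu^\ell \right] + \frac{\alpha}{2} + \frac{\alpha}{2\beta}\left[ \ln\left(1-\beta(1-q)\right) - \frac{\beta q}{1-\beta(1-q)} \right] \\ & + \frac{\alpha\beta r}{2}(1-q) - \frac{1}{\beta}\left\langle\!\left\langle \ln 2\cosh\beta\left[\mathbf{t}\cdot\boldsymbol{\xi} + \sqrt{\alpha r}\,z\right] \right\rangle\!\right\rangle , \qquad \varepsilon_2 \neq 0 , \end{aligned} \qquad (21)$$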



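and the equations to be solved are the same as in the original Hopfield model [?] except for $\mathbf{t}$. The equations for $m$ and $q$ are the same as eqs. (16) and (18), while $\mathbf{t}$ and $r$ read (here we do not have $y$)

$$t_\mu = \sum_{\ell=2}^{M} \varepsilon_\ell\, m_\mu^{\ell-1} , \qquad (22)$$

$$r = \frac{q}{\left[1-\beta(1-q)\right]^2} , \qquad \varepsilon_2 \neq 0 . \qquad (23)$$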



The sets of coupled nonlinear equations given by eqs. (16)-(20) and (16), (18), (22)-(23) are numerically solved in the next subsections in the case where the network presents a macroscopic overlap $m$ with one of the memories ($m_\mu = m\,\delta_{\mu 1}$). Since we are mainly interested in the properties of the Truncated model, results for the GH model will be given only when they differ from the original Hopfield model or when comparing both models.
3.2 The $T = 0$ Limit



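The $T = 0$ ($\beta \to \infty$) limit of the saddle point equations of the Truncated model is

$$m = \mathrm{erf}\left( \frac{t}{\sqrt{2\alpha r}} \right) , \qquad (24)$$

$$t = (1-\varepsilon y)\, m + \varepsilon\, m^3 , \qquad (25)$$

$$r = \frac{(1-\varepsilon y)^2}{\left[1 - C(1-\varepsilon y)\right]^2} , \qquad (26)$$

$$y = m^2 + \frac{\alpha}{\left[1 - C(1-\varepsilon y)\right]^2} , \qquad (27)$$

$$C = \sqrt{\frac{2}{\pi\alpha r}}\, \exp\left( -\frac{t^2}{2\alpha r} \right) , \qquad (28)$$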



where $q \to 1$ and $C = \beta(1-q)$. These equations are numerically solved for several values of $\alpha$ and $\varepsilon$, and the results are compared with the simulations, whose details are presented later. We found essentially two different regimes, depending on the value of $\varepsilon$. For large $\varepsilon$ ($\simeq 0.5$) the overlap $m$ decreases with $\alpha$, going monotonically from 1 down to zero at $\alpha_c^+(\varepsilon)$, signalling a second order phase transition. As $\varepsilon$ decreases ($\simeq 0.36$), the overlap presents a local minimum before finally going to zero at $\alpha_c^+(\varepsilon)$. If $\varepsilon < \varepsilon_c = 0.3587$, the minimum becomes a gap separating two retrieval regions. These results are illustrated in figs. 1 and 2 and summarized in the $T = 0$ phase diagram of fig. ?. The critical values $\alpha_c^\pm(\varepsilon)$ are associated with second order transitions at $T = 0$. The gap is delimited by the lines $\alpha'_c(\varepsilon)$ and $\alpha_c^-(\varepsilon)$, which meet at an endpoint near $(\varepsilon, \alpha) \simeq (0.3543, 0.7784)$ (see fig. ?): the left border is always given by $\alpha'_c(\varepsilon)$, while the right one is defined by $\alpha_c^-(\varepsilon)$ (second order) for $\varepsilon < 0.3543$ and by $\alpha'_c(\varepsilon)$ (first order) for $0.3543 < \varepsilon < 0.3587$. As $\varepsilon$ approaches zero, the gap width, $\alpha_c^- - \alpha'_c$, goes to infinity, and for negative values of $\varepsilon$ only the first order transition associated with $\alpha'_c(\varepsilon)$ is present (see fig. ?). Near $\varepsilon_c$ the behavior of the model close to the gap is very complex, and the effects of replica symmetry breaking (RSB) should be taken into account in order to decide what kind of critical points actually exist. Apparently (in the replica symmetric approximation) there is a critical point at $(\varepsilon, \alpha) \simeq (0.3492, 0.833)$, where the first order line ceases to exist, and an endpoint nearby, $(0.3543, 0.7784)$, where both lines (first and second order) cross (see fig. ?). It must be emphasized that the retrieval solution is unstable in this limit ($T = 0$), as can be seen from the negative value of the entropy



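$$S = -\frac{\alpha}{2}\left[ \ln\left(1 - C(1-\varepsilon y)\right) + \frac{C(1-\varepsilon y)}{1 - C(1-\varepsilon y)} \right] . \qquad (29)$$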



The numerical values of the entropy are larger in magnitude than in the Hopfield model, possibly indicating that the effects of RSB are stronger here. There is a range of $\varepsilon$, $0.3492 < \varepsilon < 0.3543$, where the system undergoes up to four phase transitions (two first and two second order) since, for a fixed value of $\varepsilon$, as $\alpha$ is increased it crosses the line $\alpha'_c$ twice. The richness of this phase diagram deserves a separate study, including the analysis of the stability of the solutions as well as the effects of RSB, mainly in this low temperature region, but it is beyond the scope of this paper. The peak in the second retrieval region occurs because the noise in the Hopfield term of the local field in eq. (11) is completely compensated by the noise generated by the self couplings in the fourth order term when $\varepsilon y = 1$ (see discussion below). For negative values of $\varepsilon$ this never happens, since $1 - \varepsilon y = 1 + |\varepsilon| y$ is always positive ($y$ is positive definite). Differently from the GH model, here there is no cut-off in $\alpha'_c$ as $\varepsilon$ decreases, although $\alpha'_c \to 0$.
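The $T = 0$ equations can be iterated numerically. The sketch below is a plain fixed-point iteration started from perfect retrieval. It assumes the equations in the form $m = \mathrm{erf}(t/\sqrt{2\alpha r})$, $t = (1-\varepsilon y)m + \varepsilon m^3$, $r = (1-\varepsilon y)^2/[1 - C(1-\varepsilon y)]^2$, $y = m^2 + \alpha/[1 - C(1-\varepsilon y)]^2$ and $C = \sqrt{2/\pi\alpha r}\,\exp(-t^2/2\alpha r)$, together with the entropy $S = -(\alpha/2)[\ln(1-X) + X/(1-X)]$, $X = C(1-\varepsilon y)$. It is only meant for points well inside a retrieval region; near the transitions and the gap a damped iteration or a proper root finder would be needed. Names are hypothetical.

```python
import math

def solve_t0(alpha, eps, iters=1000, tol=1e-13):
    """Plain fixed-point iteration of the T = 0 replica-symmetric equations (24)-(28),
    started from perfect retrieval (m = 1, C = 0, y = 1 + alpha). Returns (m, C, y).
    No damping: near the transitions it may fail to converge or land on another branch."""
    m, C, y = 1.0, 0.0, 1.0 + alpha
    for _ in range(iters):
        a = 1.0 - eps * y                               # 1 - eps*y
        denom = 1.0 - C * a                             # 1 - C(1 - eps*y)
        r = max(a * a / denom ** 2, 1e-300)             # eq. (26), guarded against r = 0
        t = a * m + eps * m ** 3                        # eq. (25)
        R = t / math.sqrt(2.0 * alpha * r)
        m_new = math.erf(R)                             # eq. (24)
        C_new = math.sqrt(2.0 / (math.pi * alpha * r)) * math.exp(-R * R)   # eq. (28)
        y_new = m_new ** 2 + alpha / denom ** 2         # eq. (27)
        change = abs(m_new - m) + abs(C_new - C) + abs(y_new - y)
        m, C, y = m_new, C_new, y_new
        if change < tol:
            break
    return m, C, y

def entropy_t0(alpha, eps, C, y):
    """Eq. (29): S = -(alpha/2) [ln(1 - X) + X/(1 - X)] with X = C(1 - eps*y)."""
    X = C * (1.0 - eps * y)
    return -0.5 * alpha * (math.log1p(-X) + X / (1.0 - X))
```

For $\varepsilon = 0$ the iteration reduces to the Hopfield $T = 0$ equations, and for any $0 < X < 1$ the returned entropy is negative.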

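The points $\alpha_c^\pm(\varepsilon)$ where $m$ continuously approaches zero ($m \sim |\alpha - \alpha_c^\pm(\varepsilon)|^{1/2}$ as $\alpha \to \alpha_c^\pm(\varepsilon)$) are obtained by expanding equations (24)-(28) for small $m$:

$$\alpha_c^\pm(\varepsilon) = \frac{1}{\varepsilon}\left(1 \pm \sqrt{\frac{2\varepsilon}{\pi}}\right)^2 . \qquad (30)$$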




For $\varepsilon > \varepsilon_c$ there is only one transition, at $\alpha_c^+(\varepsilon)$, as can be seen in the $T = 0$ phase diagram (fig. ?), and the critical value of $\alpha$ is a decreasing function of $\varepsilon$. Two different retrieval phases appear for $\varepsilon < \varepsilon_c$ and, when the right border of the gap is second order (given by $\alpha_c^-$), the width of the second retrieval region is

$$A' = \alpha_c^+(\varepsilon) - \alpha_c^-(\varepsilon) = 4\sqrt{\frac{2}{\pi\varepsilon}} , \qquad (31)$$

and the gap $A$ between the first and second retrieval regions is

$$A = \alpha_c^-(\varepsilon) - \alpha'_c(\varepsilon) , \qquad (32)$$

where $\alpha'_c(\varepsilon)$ is the point of the first order transition in the first region. As $\varepsilon \to 0$, $A$ goes to infinity as $\varepsilon^{-1}$ and $A'$ as $\varepsilon^{-1/2}$. Thus, when the Truncated model recovers the Hopfield model ($\varepsilon \to 0$), the location of the second retrieval region on the $\alpha$ axis goes to infinity and $\alpha'_c \to 0.138$.
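The closed forms for the second order lines are easy to evaluate. This sketch assumes eq. (30) in the form $\alpha_c^\pm = (1 \pm \sqrt{2\varepsilon/\pi})^2/\varepsilon$ and checks the width (31) and the $\varepsilon \to \infty$ value $2/\pi$. For the $\varepsilon \to 0$ end it computes the Hopfield value $\alpha'_c \simeq 0.138$ from the standard reduction of the Hopfield $T = 0$ equations to the single variable $y = m/\sqrt{2\alpha r}$. Function names are illustrative.

```python
import math

def alpha_c_pm(eps):
    """Eq. (30): second order lines (alpha_c^-, alpha_c^+) for eps > 0."""
    u = math.sqrt(2.0 * eps / math.pi)
    return (1.0 - u) ** 2 / eps, (1.0 + u) ** 2 / eps

def hopfield_alpha_c(n=20000, y_max=5.0):
    """Hopfield T = 0 capacity from the standard reduction of m = erf(m / sqrt(2 alpha r)):
    alpha(y) = [erf(y) - (2/sqrt(pi)) y exp(-y^2)]^2 / (2 y^2), maximised over y on a grid."""
    best = 0.0
    for k in range(1, n + 1):
        y = y_max * k / n
        g = math.erf(y) - 2.0 / math.sqrt(math.pi) * y * math.exp(-y * y)
        best = max(best, g * g / (2.0 * y * y))
    return best
```

For $\varepsilon = 0.3$ this gives a second retrieval region between roughly $\alpha \simeq 1.06$ and $\alpha \simeq 6.9$, with the perfect-retrieval point $(1-\varepsilon)/\varepsilon$ inside it.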

When $\varepsilon \to \infty$, the solution for $m$ can be obtained as a function of $\alpha$; it reads

$$m = \mathrm{erf}\left( \sqrt{\frac{1}{2}\ln\frac{2}{\pi\alpha}} \right) \qquad (33)$$

and

$$\alpha_c^+(\infty) = \frac{2}{\pi} . \qquad (34)$$

In this limiting case, the system behaves suitably as an associative device ($m \simeq 1$) only if the number of embedded patterns is finite ($\alpha = 0$).
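Eq. (33) can be tabulated directly: the overlap falls continuously from 1 as $\alpha \to 0$ to 0 at $\alpha = 2/\pi$. In this minimal sketch, the clip only absorbs floating-point rounding at $\alpha = 2/\pi$.

```python
import math

def m_eps_infinite(alpha):
    """Eq. (33): retrieval overlap in the eps -> infinity limit, for 0 < alpha <= 2/pi."""
    arg = 0.5 * math.log(2.0 / (math.pi * alpha))
    return math.erf(math.sqrt(max(arg, 0.0)))   # clip rounding noise at alpha = 2/pi
```

For instance, the overlap is already below 0.2 at $\alpha = 0.6$, close to the limit $2/\pi \simeq 0.637$.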

It is also possible to obtain the location of the maximum value of $m$ in the second retrieval region: the values of $\alpha$ that allow $m = 1$ in (24)-(28) are $\alpha = 0$ and

$$\alpha = \frac{1-\varepsilon}{\varepsilon} . \qquad (35)$$

At the peak the noise from the high patterns (measured by $r$) goes to zero because $\varepsilon y = 1$ (and since $y$ is positive definite, this only happens for positive $\varepsilon$). Thus, the leading (second order) noise contribution vanishes at the peak, and one has to take into account the next (fourth order) term. This allows us to introduce an optimal learning rule in the next section by choosing the weight $\varepsilon$ as the value that satisfies (35).

The above results can be qualitatively understood through a signal to noise analysis. The local field acting upon the $i$-th neuron when the system is recalling the first pattern is

$$h_i \simeq (1 - \alpha\varepsilon)\, \xi_i^1 + (1 - \alpha\varepsilon - \varepsilon)\, \frac{1}{N} \sum_{\mu \neq 1} \sum_{j \neq i} \xi_i^\mu \xi_j^\mu \xi_j^1 , \qquad (36)$$

where we considered the contributions from both two and four neuron couplings in eq. (7). An increase in $\varepsilon$ then has two effects: it acts on both the signal and the noise terms. Depending on the value of $\varepsilon$, an increase in $\alpha$ may suppress either the signal or the noise term. When $\varepsilon$ is small enough, the sum in eq. (36) may overwhelm the signal term and we have a Hopfield-like mechanism suppressing the retrieval abilities. In this case the overall behavior is qualitatively similar to the second order Hopfield model, as in the GH model. This also corresponds to the first retrieval region for $\varepsilon < \varepsilon_c$. On the other hand, when $\varepsilon > \varepsilon_c$, together with a decrease in the noise term one can also observe a detectable decrease in the signal term with increasing $\alpha$ when in the second retrieval region. Now the loss of retrieval ability is not due to an overwhelming noise term, but to the annihilation of the signal one. In other words, the energy minimum is not shifted away from the pattern by the noise; instead, the energy barriers around the minima are lowered. A similar effect is observed in the complete model [?]. This explains the qualitatively different behavior of the network in this region. Also, the second (noise) term has zero mean and variance $\sigma^2 = \alpha(1 - \alpha\varepsilon - \varepsilon)^2$. An important result is that the variance is zero if $\alpha = 0$ or $\alpha = (1-\varepsilon)/\varepsilon$. These are the points where the retrieval is perfect ($m = 1$); higher order noise terms, which we did not write, have dispersions at least $O(N^{-1/2})$ times smaller. This will be considered in the next section, where an optimal learning rule is detailed. Also, for increasing values of $\varepsilon$ the coefficient of the signal term, $1 - \alpha\varepsilon$, changes sign at decreasing values of $\alpha$, explaining why the critical value of $\alpha$ decreases.
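Reading off the leading-order coefficients of eq. (36) makes the two special loads explicit. This is a small sketch with a hypothetical helper; the higher order noise terms mentioned above are ignored.

```python
import math

def signal_noise(alpha, eps):
    """Leading-order coefficients of eq. (36): signal 1 - alpha*eps and the standard
    deviation sqrt(alpha) * |1 - alpha*eps - eps| of the cross-talk term."""
    signal = 1.0 - alpha * eps
    noise_sd = math.sqrt(alpha) * abs(1.0 - alpha * eps - eps)
    return signal, noise_sd
```

At $\alpha = (1-\varepsilon)/\varepsilon$ the returned noise is zero while the signal equals $\varepsilon$, and at $\alpha = 1/\varepsilon$ the signal coefficient itself vanishes.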

To verify these results we performed zero temperature simulations with network sizes up to 512 neurons. The steps are the following: one of the embedded memories is chosen as the initial state and a spin is (sequentially) flipped whenever this lowers the system energy. This procedure is repeated until a stable fixed point is reached, and the final overlap $m$ with the chosen memory is measured. The averages were taken over 5 different sets of patterns, with 200 runs in each set, for sizes $N = 128$, 256 and 512 neurons. In fig. ? we can see the final overlap $m$ versus $\alpha$ for $\varepsilon = 0.3$, clearly showing the existence of the gap. The results for $\varepsilon = 1$ can be found in ref. [?], and a more extensive study of other values of $\varepsilon$, as well as of the dynamical differences between the first and second regions, is in progress. As in other models, the simulations show some discrepancies with respect to the mean field calculations (mainly near the transitions), which is in part due to the replica symmetry instability at $T = 0$, as supported by the negative entropy at zero temperature obtained with the replica symmetric ansatz. Also remark that the remanent magnetization above $\alpha_c$ for $\varepsilon = 0.3$, $m \simeq 0.1$, is lower than the one found for the Hopfield and GH models ($m \simeq 0.2$) and, as can be seen in fig. ?, is a decreasing function of $\varepsilon$.
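This is a minimal sketch of the zero-temperature procedure just described. It assumes the $M = 2$ Truncated energy written through the overlaps, so each trial flip costs $O(P)$ operations instead of a sum over $O(N^4)$ couplings. The sizes, the sweep cap and the function names are illustrative choices, not those of the original simulations. When a run starts from `patterns[0]`, the final overlap with the chosen memory is `m[0]`.

```python
import random

def truncated_energy(m, N, eps):
    """Eq. (6) with M = 2 through the overlaps, using
    sum_{mu<nu} m_mu^2 m_nu^2 = [(sum m^2)^2 - sum m^4] / 2."""
    s2 = sum(x * x for x in m)
    s4 = sum(x ** 4 for x in m)
    return -0.5 * N * s2 + 0.25 * eps * N * (s2 * s2 - s4)

def relax(patterns, S, eps, max_sweeps=100):
    """Zero-temperature sequential dynamics: visit the spins in order and flip S_i
    whenever the flip lowers the energy; stop after a sweep without flips."""
    N, P = len(S), len(patterns)
    S = list(S)
    m = [sum(x * s for x, s in zip(xi, S)) / N for xi in patterns]
    E = truncated_energy(m, N, eps)
    for _ in range(max_sweeps):
        flipped = False
        for i in range(N):
            # flipping S_i changes every overlap by -2 xi_i^mu S_i / N
            m_new = [m[mu] - 2.0 * patterns[mu][i] * S[i] / N for mu in range(P)]
            E_new = truncated_energy(m_new, N, eps)
            if E_new < E - 1e-12:
                S[i] = -S[i]
                m, E = m_new, E_new
                flipped = True
        if not flipped:
            break
    return S, m
```

Averaging `m[0]` over pattern sets and runs while sweeping $P$ at fixed $N$ gives curves of the kind discussed above; the sketch makes no claim to reproduce the published figures.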

A final remark concerning the free energy of the Truncated model at $T = 0$: the retrieval states are global minima for all values of $\alpha$ below $\alpha_c$ if $\varepsilon > \varepsilon_c$. For $\varepsilon < \varepsilon_c$ there is a range of $\alpha$ in the first retrieval region where they are local minima (metastable), $\alpha_M < \alpha < \alpha'_c$ (in the second retrieval region they are always global minima). For $\varepsilon \to 0$, $\alpha_M \to 0.05$, as expected [12].

On the other hand, in the GH model, whose equations at $T = 0$ are (the equations for $m$ and $C$ are the same as (24) and (28), respectively)

$$t = \sum_{\ell=2}^{M} \varepsilon_\ell\, m^{\ell-1} , \qquad (37)$$

$$r = (1-C)^{-2} , \qquad (38)$$

the storage capacity is a monotonically increasing function of both $M$ and $\varepsilon_\ell$. Fig. ? shows $\alpha_c$ versus $\varepsilon$ in the case $\varepsilon_\ell = \delta_{2\ell} + \varepsilon\,\delta_{4\ell}$, that is, when only second and fourth order terms are considered. The line $\alpha_M$, where the retrieval states become global minima, and the value of $m$ at criticality, $m_c(\varepsilon)$, are also plotted. When $\varepsilon \to \infty$, $m_c(\varepsilon)$ tends to the asymptotic value $m_c(\infty) = 0.918$ and the critical value of $\alpha$ grows as $\alpha_c \sim \varepsilon^2$, which can be understood as follows. The bigger $\varepsilon$ is, the more important the fourth order term becomes, and the capacity tends to be of order $O(N^3)$ (attained in the absence of the second order term [?]); that is, the maximum number of storable patterns behaves asymptotically as $P_c \sim \varepsilon^2 N$. The value of $\alpha_M$ also grows as $\varepsilon^2$ as $\varepsilon \to \infty$. For negative values of $\varepsilon$ there is a cut-off: below $\varepsilon^{\rm cut}$ there is no retrieval ($t$, given by eq. (37), is null). For instance, for the case $\varepsilon_\ell = \delta_{2\ell} + \varepsilon\,\delta_{4\ell}$, $\varepsilon^{\rm cut} = -2/\pi$. As $\varepsilon \to \varepsilon^{\rm cut}$ (from above), $\alpha_c \to 0$ and $m_c \to 1$. In the general case we have

M 

E^r*^ = , (39) 

i=2 

defining a hyperplane in the space of the non-zero $\varepsilon_\ell$'s. There is also a cut-off in $\alpha_M$, below which the memories are never global minima of the free energy. For instance, for the same case ($k = 4$), $\varepsilon^* = 2/\pi - 1 \simeq -0.363$.

3.3 The $T \neq 0$ Case

The GH model presents some universal features that do not depend on the particular values of $\varepsilon_\ell$ and $M$. For instance, the line below which the SG states exist is

$$T_g = 1 + \sqrt{\alpha} , \qquad \forall\, \varepsilon_\ell, M , \qquad (40)$$

with $\varepsilon_2 \neq 0$. The phase diagram for $\varepsilon_\ell = \delta_{2\ell} + \varepsilon\,\delta_{4\ell}$ is presented in fig. ? ($\varepsilon = 1$, since the qualitative features do not change with $\varepsilon$). There are three relevant lines: $T_g$, given by eq. (40), signalling the appearance of SG states; $T_m$, where the retrieval states ($m \neq 0$) first appear; and finally $T_c$, where the retrieval states become global minima of the free energy. At $\alpha = 0$, $T_g$ and $T_m$ do not meet, implying that there is a transition between the retrieval and paramagnetic phases also for small values of $\alpha$. This phase diagram is similar to that of the $Q$-state Potts neural network model [?], due to a formal equivalence between multistate neurons and binary spins with diluted multispin interactions [15].



The SG phase is reentrant, and hence the maximum possible value of $\alpha$ is not at $T = 0$ ($\alpha_c = 1.556$) but at a non-zero value of $T$: $\alpha_c^{\max} \simeq 1.566$ for $T \simeq 0.126$. The degree of reentrance depends on $\varepsilon$: in the Hopfield limit ($\varepsilon_\ell = \delta_{2\ell}$) it is very small [?]. As a consequence, a small amount of noise improves the storage capacity of the system. This can also be observed in the behavior of $m$ with $T$ near the reentrant region: the overlap first increases before decreasing, indicating a small improvement with thermal noise. These effects may be an artifact of the replica symmetric approximation (whose failure is expected to be more severe near the reentrant region): $\alpha_c^{\max}$ is believed to be a lower bound for the actual $T = 0$ critical capacity obtained when the replica symmetry is broken (the reentrant phase would then disappear). In other words, the actual $T = 0$ capacity should exceed $\alpha_c^{\max}$, which is supported by numerical simulations in the case $\varepsilon = 10$.

The phase diagram for the Truncated model with the lines $T_m$ and $T_c$ for $\varepsilon = 1$ is shown in fig. ?. In fig. ? the line $T_m$ is shown for two values of $\varepsilon$: 0.36 and 0.3. For $\varepsilon > \varepsilon_c$, the retrieval states are always global minima of the free energy at low temperatures, while for $\varepsilon < \varepsilon_c$ they may become local minima in the first retrieval region. For all values of $\varepsilon$ and $\alpha = 0$, $T_m = T_g = T_c = 1$. Nevertheless, for $\alpha \neq 0$, the qualitative features of the phase diagram depend on $\varepsilon$, differently from the GH model. When $\varepsilon = 1$ (fig. ?), $T_m$ decreases monotonically and there is no reentrant phase. As $\varepsilon$ decreases, the $T_m$ line develops a minimum, implying that for some range of temperatures there are two retrieval regions (e.g., for $\varepsilon = 0.36$, $0.086 < T < 0.202$), as shown in fig. ?. For even smaller $\varepsilon$, the minimum becomes a gap (fig. ?) and the first retrieval region is reentrant. Also, the introduction of thermal noise in the system decreases the value of $\alpha$ at which $m$ has a maximum in the second retrieval region (fig. 10). At $T = 0$ there are three phase transition lines in the $\varepsilon \times \alpha$ plane: one first order, $\alpha'_c$, and two second order ones (fig. ?). For $T \neq 0$, all three phase transition lines are first order.

The temperature at which the SG solution ($m = 0$, $q \neq 0$) continuously disappears ($q \to 0$) is

$$T_g(\alpha, \varepsilon) = \frac{1+\sqrt{\alpha}}{\sqrt{\alpha}}\left[ (1-\alpha\varepsilon)(1+\sqrt{\alpha}) - 1 \right] . \qquad (41)$$

These lines are shown in fig. 11 for several values of $\varepsilon$. From the above equation we can see that $T_g(0, \varepsilon) = 1$ for all values of $\varepsilon$, and for $\varepsilon = 0$ one recovers the Hopfield line $T_g(\alpha, 0) = 1 + \sqrt{\alpha}$. The line $T_g$, for $\varepsilon \neq 0$, goes to zero at (see inset of fig. 11)

$$\alpha_g = \frac{1}{4}\left( 1 - \sqrt{1 + \frac{4}{\varepsilon}} \right)^2 . \qquad (42)$$

The same happens in the pseudo-inverse model [?], although in that case the transition is discontinuous and $\alpha_g \simeq 0.363$. Although there is a solution with $q = 0$ for the SG phase along the line $\alpha = \varepsilon^{-1}$, the critical line $\alpha_g$, where the SG phase disappears, is such that $\alpha_g < \varepsilon^{-1}$ (see fig. 11, inset).
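Here are eqs. (41)-(42) in code, as a quick check of the limits quoted above and of the ratio $\alpha_g/\alpha_c$ quoted later for $\varepsilon = 0.5$. The sketch writes $T_g$ in the algebraically equivalent form $(1+\sqrt{\alpha})[1 - \varepsilon\sqrt{\alpha}(1+\sqrt{\alpha})]$ to avoid the $0/0$ at $\alpha = 0$. It assumes eq. (30) in the form $\alpha_c^+ = (1+\sqrt{2\varepsilon/\pi})^2/\varepsilon$; names are illustrative.

```python
import math

def T_g(alpha, eps):
    """Eq. (41) in the equivalent form (1 + sqrt a)[1 - eps sqrt a (1 + sqrt a)]."""
    s = math.sqrt(alpha)
    return (1.0 + s) * (1.0 - eps * s * (1.0 + s))

def alpha_g(eps):
    """Eq. (42): zero of T_g, i.e. the root of alpha + sqrt(alpha) = 1/eps."""
    return 0.25 * (1.0 - math.sqrt(1.0 + 4.0 / eps)) ** 2

def alpha_c_plus(eps):
    """Upper second order line of eq. (30)."""
    return (1.0 + math.sqrt(2.0 * eps / math.pi)) ** 2 / eps
```

For $\varepsilon = 0.5$ this gives $\alpha_g = 1$ and $\alpha_c^+ \simeq 4.89$, i.e. $\alpha_g/\alpha_c \simeq 0.2$.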

We performed a numerical simulation to verify whether or not there is a value of $\alpha$ above which the number of spurious states suddenly decreases. Differently from the simulation presented in the previous section, here the initial state is chosen at random and, after the system has reached a stable state, one searches for the memory with the maximum overlap with that state. The idea is that if the system starts from random positions, the final state is either one of the embedded memories or some spurious state, if any. In order to quantify the results we define the following quantity:

$$\Delta = \left\langle\!\left\langle m_{m_0=1} - m_{\rm rs} \right\rangle\!\right\rangle , \qquad (43)$$

where $m_{\rm rs}$ is the mean final overlap when the initial state is random and $m_{m_0=1}$ is the mean final overlap when the initial state is one of the memories. The symbol $\langle\langle \cdot \rangle\rangle$ stands for the average over several sets of patterns and initial states. Notice that this quantity is proportional to the fractional "occupation" of the phase space by the basins of spurious states. Also, if $\alpha > \alpha_c$ both lines should merge since there is no memory retrieval; that is, this quantity yields relevant information only for $\alpha < \alpha_c$. The simulation results for the Truncated model with $\varepsilon = 0.5$ are presented in fig. 12: $\Delta = 0$ for $\alpha = 0$ and for $\alpha$ roughly above 0.5. This result supports the analytical prediction of the disappearance of the SG states above a given value of $\alpha$. However, the agreement is only qualitative, since the predicted value of $\alpha_g/\alpha_c$ for $\varepsilon = 0.5$ is 0.2. We expect this difference to be an artifact of the replica symmetry instability at low temperatures. For the sake of comparison, $\Delta$ is also shown for the Hopfield and GH models, figs. 13 and 14, respectively. In both cases, below $\alpha_c$, $\Delta$ seems to increase both with $\alpha$ and $N$. For the Hopfield model, fig. 13, the dependence of $\Delta$ on $\alpha$ is linear, the exponent depending on $N$. For $\varepsilon = 1$, $\Delta$ attains a minimum for small $\alpha$ and grows for increasing values of both $\alpha$ and $N$. Thus, except for the Truncated model, the spurious states dominate the phase space landscape, either by their increasing number or by their increasing basins of attraction.



4 THE OPTIMAL LEARNING RULE 

The second retrieval region in the Truncated model originates in the competition between the noises of the Hopfield term and of the fourth order couplings. The value of $\alpha$ at which this noise term is completely compensated is associated with a maximum in the retrieval quality and depends on the value of $\varepsilon$. When $\varepsilon$ is allowed to vary with $\alpha$, more specifically, if

$$\varepsilon = \varepsilon_{\rm opt}(\alpha) = \frac{1}{1+\alpha} , \qquad (44)$$

the system always works in the minimum noise region (see eq. (36)): the retrieval quality is maximum and the capacity of the network is greatly enhanced. To estimate the limit in the load parameter of such a model, we investigated higher order noise terms, which are of the order $\varepsilon\alpha/N^{1/2} \sim P N^{-1/2}/(P+N)$; that is, in the thermodynamic limit this term always goes to zero regardless of the asymptotic behavior of $P$. On the other hand, the signal term is also modified by the fourth order couplings. The thermodynamic limit of the signal term $1 - \alpha\varepsilon$ when $\varepsilon = \varepsilon_{\rm opt}$ and $P \sim N^k$ is given by



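$$\lim_{N\to\infty}\left(1 - \alpha\varepsilon_{\rm opt}\right) = \begin{cases} 1 , & \text{if } k < 1 , \\ 1/2 , & \text{if } k = 1 , \\ 0 , & \text{if } k > 1 , \end{cases} \qquad (45)$$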
Hence P ~ N^k with k ≤ 1, that is, finite α, is required to guarantee that the signal term does not vanish.
In this optimal learning rule, the load limit of the network originates in the 
annihilation of the signal term and not in a decrease in the signal to noise 
ratio. 
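Both limits above follow from eq. (44) together with α = P/N. A short worked version of the algebra, assuming P = N^k with unit prefactor (which is what yields the value 1/2 at k = 1):

$$
\varepsilon_{\rm opt}\,\alpha=\frac{\alpha}{1+\alpha}=\frac{P}{P+N},
\qquad
\frac{\varepsilon_{\rm opt}\,\alpha}{N^{1/2}}=\frac{P\,N^{-1/2}}{P+N}\le N^{-1/2}\;\xrightarrow[N\to\infty]{}\;0,
$$

$$
1-\alpha\,\varepsilon_{\rm opt}=\frac{1}{1+\alpha}=\frac{N}{N+P}\;\overset{P=N^{k}}{=}\;\frac{1}{1+N^{k-1}}\;\xrightarrow[N\to\infty]{}\;
\begin{cases}1, & k<1,\\ 1/2, & k=1,\\ 0, & k>1.\end{cases}
$$

The noise bound holds for any growth of P, while the signal term vanishes as soon as P grows faster than N.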

A final remark about this optimal rule is that when α increases, provided
it remains finite, ε_opt decreases: the contribution from the fourth order
term decreases and the changes in the fourth order couplings, see eq. (|]), become
smaller. In other words, the more the network knows, the easier it is to learn
new patterns. Also, from the biological point of view, synapses of order
higher than two should be corrections to the second order ones, which is indeed the
case here, since the role played by the fourth order corrections depends on
the value of ε and the network presents larger load capacities with decreasing
ε. The GH model storage capacity, on the other hand, is an increasing
function of the weight ε.
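A small numerical check of eq. (44), written as a sketch with illustrative function names: ε_opt(α) = 1/(1 + α) decreases monotonically with α, its inverse α = (1 − ε)/ε locates the retrieval peak for a given ε (about 2.33 for ε = 0.3, the value quoted in the caption of fig. 5), and at finite N the signal and higher order noise terms approach the limits of eq. (45) when P = N^k with unit prefactor (an assumption).

```python
def eps_opt(alpha: float) -> float:
    """Optimal fourth order weight, eq. (44): eps_opt = 1 / (1 + alpha)."""
    return 1.0 / (1.0 + alpha)


def alpha_peak(eps: float) -> float:
    """Load at which eps = eps_opt(alpha), i.e. alpha = (1 - eps) / eps."""
    return (1.0 - eps) / eps


def signal_term(n: int, k: float) -> float:
    """Signal term 1 - alpha * eps_opt for P = N**k patterns (unit prefactor assumed)."""
    alpha = n ** k / n
    return 1.0 - alpha * eps_opt(alpha)


def higher_order_noise(n: int, k: float) -> float:
    """Order of the higher order noise: eps_opt * alpha / sqrt(N) = P N^(-1/2) / (P + N)."""
    alpha = n ** k / n
    return eps_opt(alpha) * alpha / n ** 0.5


if __name__ == "__main__":
    print(f"alpha_peak(0.3) = {alpha_peak(0.3):.2f}")  # 2.33
    for k in (0.5, 1.0, 1.5):
        sizes = (10**2, 10**4, 10**6)
        print(k, [round(signal_term(n, k), 4) for n in sizes],
              [f"{higher_order_noise(n, k):.1e}" for n in sizes])
```

The signal column tends to 1, 1/2 and 0 for k = 0.5, 1 and 1.5, while the noise column shrinks with N in every case.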

These results are valid for T = 0, and the optimal behavior is expected
to hold at low temperatures. It would be interesting to check what happens
at higher values of T.

5 CONCLUSIONS 

We compared the effect of two different fourth order corrections to the
standard Hopfield model by considering two learning rules, and investigated their
behavior by calculating the retrieval capabilities with and without thermal noise.
The phase diagrams for both models were presented; the strikingly
different behaviors they display come from the nature of the fourth order
connections, which may or may not contain mixed memory terms.

The original non-truncated model [^, ^] showed an improved performance
due to a strong reduction of spurious states, with a consequent enhancement
of the load capacity. The limit of the storage capacity of that network
originates in the lowering of the energy barriers between memories and not in
the displacement of the energy minima away from the patterns; the retrieval is therefore
always perfect at T = 0. However, the order of the couplings (and their
number) increases with P, so that the amount of information per synapse
decreases.

Here, we introduced a truncated model that shows a very rich behavior,
summarized by the many phase diagrams presented in the previous sections.
Besides several phase transitions, either first or second order, the system
may present a gap separating two distinct retrieval phases. The first one
recovers the behavior of the standard Hopfield model in the ε → 0 limit. In
the second retrieval region, on the other hand, an increase in the load of the
network lowers the signal term. Thus, unlike in the Hopfield model, the capacity
limit in this region is due to the lowering of the energy barriers. The
disappearance of the SG phase for α > α_g is possibly another consequence of
suppressing the noise: there are no detectable spurious states, which is
supported by the numerical simulations. When the gap is not present, there is
only one retrieval region, which is more Hopfield-like for small α but presents
a T = 0 second order transition when losing its retrieval abilities. The second
region, when it exists, presents a maximum in the curve m versus α at
ε = (1 + α)^{-1}. Hence, in general, when a network is designed to work in a
given range of storage (provided α is finite), it is always possible to choose
some ε that optimizes the retrieval. In this case, similarly to the original
model, there are no second order noise terms (for a convenient weight ε) and
the load capacity is limited by the annihilation of the signal term. However,
other quantities, for instance the size of the basins of attraction and/or the
convergence time, still deserve further investigation.

The effect of the sixth order term in the expansion, eq. (^, may also be
considered. However, we do not expect new effects to appear other than an
increase in the storage capacity (or perhaps more than one gap), and there
may exist at least one pair (ε_4*, ε_6*) that would improve the capacity of
the model. Furthermore, the situation in which only the high order connections
are diluted is interesting from the biological point of view, but its dynamics is
hard to treat mathematically p, |1^. A version of the model in which dilution
is present in both terms is presently being studied.

The stability of the solutions, together with RSB effects, should be studied,
mainly in the T = 0 limit, which surprisingly showed a very rich phase diagram.
An interesting problem is to investigate whether or not the critical point and the
endpoint merge into a tricritical point when the replica symmetry is broken,
and then to obtain the critical exponents near those points.

Finally, we should point out that the biological relevance of these results
is still unclear, although they are interesting per se. Other fields
that use spin Hamiltonians may benefit from this model. As an
example, one can use it as a fitness function in theoretical population genetics
[0], where the gap might stand for some constraints that cannot be satisfied
by any species in that environment or for some forbidden genetic traits.
Figure Captions



Figure 1: Overlap versus α for the Truncated model and ε = 0.3, 0.36 and
0.5. When ε → 0+, α_c^+ → 0.138 and α_c^- → ∞, recovering the original
Hopfield model.



Figure 2: Overlap versus α for the Truncated model for ε = 1, 2, 5 and ∞.
The retrieval quality decreases for increasing values of ε and, as ε → ∞,
α_c → 2/π. The circles are the results of the numerical simulation using
N = 512 (see text).



Figure 3: The T = 0 phase diagram for the Truncated model. For ε > ε_c ≈
0.3587 the gap Δ = α_c^- − α_c^+ disappears, while for ε → 0 the gap diverges as
Δ ~ ε^{-1}. The lines α_c^-(ε) are second order while α_c^+(ε) is first order. The dashed
line (ε_opt) gives the value of α ≠ 0 where the peak (m = 1) occurs. Inset: the
values of α at the first order transition (α_c^+) for ε < 0.
Figure 4: Region near the point where the gap appears. The dashed line is
second order while the solid one is first order. The line α_c^+
ends in a critical point at ε ≈ 0.3492 and both lines cross at an endpoint
at ε ≈ 0.3543. Inset: overlap versus α near the critical point, showing the
second first order transition for ε = 0.3493, 0.3492 and 0.3491 (from left to
right).

Figure 5: Overlap versus α for the Truncated model and ε = 0.3. The full
curve is the T = 0 solution of eqs. (p^)-(28), while the points are obtained
through numerical simulation (see text). Notice the perfect retrieval at α = 0
and α = (1 − ε)/ε ≈ 2.33.

Figure 6: Critical values of α versus ε at T = 0 for the GH model. The
dashed line marks the values of α below which the memories are global minima
of the free energy. The overlap at criticality, m_c(ε), is shown in the inset
and its asymptotic value, m_c(∞), is 0.918. There is a cut-off for negative
values of ε (in this particular case at ε = −0.5), and m_c → 1 as ε approaches
this cut-off from above. There is another cut-off, ε ≈ −0.363, below which the
memories are never global minima of the free energy.

Figure 7: Phase diagram for the ε = 1 GH model. Note the strong
reentrant behavior of both T_c and T_m. The line T_g is the same for all values
of ε and M. Notice that in this case there is a transition between the retrieval
phase and the paramagnetic one for small α.

Figure 8: Phase diagram for the Truncated model with ε = 1. Below the
line T_m there are m ≠ 0 solutions (retrieval states), and these states become
global minima of the free energy below T_c (dashed). Notice that there is only
one retrieval region and no reentrant phase.

Figure 9: The line T_m for the Truncated model with ε = 0.3 and 0.36. In the
latter case, it already shows the structure of two retrieval regions, although
they are still connected: there is a range of temperature, 0.086 < T < 0.202,
in which there are two retrieval regions. For ε = 0.3 the two regions are separated
and the first retrieval region is reentrant.
Figure 10: Overlap m versus α for ε = 0.36 at two different temperatures.
The peak is shifted to the left by thermal noise, and the transitions that
are continuous at T = 0 become discontinuous at T ≠ 0.



Figure 11: Transition temperature for the SG solutions, T_g, for several values
of ε in the Truncated model. As ε → 0, one recovers the Hopfield line
T_g = 1 + √α. The inset shows the points α_g where T_g = 0, as well as
the line (see text).



Figure 12: Simulation results for M = ⟨⟨m_{m_0=1} − m_ris⟩⟩ versus α/α_c for
the Truncated model and ε = 0.5. The maximum increases with the size N
of the network. For 0.5 < α < α_c ≈ 4.893, M ≈ 0 signals the low occupancy
of the phase space by the spurious states.



Figure 13: Simulation results for M = ⟨⟨m_{m_0=1} − m_ris⟩⟩ versus α/α_c for
the Hopfield model. Notice the linear behavior below α_c.



Figure 14: Simulation results for M = ⟨⟨m_{m_0=1} − m_ris⟩⟩ versus α/α_c for
the GH model and ε = 1. As in the original Hopfield model, the phase space
occupation by the spurious states grows with both α and N, although for
small values of α it seems to have a minimum.